Skip to content

Conversation

@shsms
Copy link
Contributor

@shsms shsms commented Apr 15, 2025

When there are multiple pool instances controlling the same components and one of them stops, that would stop all the pools.

This PR adds an emergency fix to prevent this bug, to buy some time to come up with a better solution.

Copilot AI review requested due to automatic review settings April 15, 2025 16:28
@shsms shsms requested a review from a team as a code owner April 15, 2025 16:28
@shsms shsms requested review from ela-kotulska-frequenz and removed request for a team April 15, 2025 16:28
@github-actions github-actions bot added part:docs Affects the documentation part:data-pipeline Affects the data pipeline labels Apr 15, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (3)

src/frequenz/sdk/timeseries/pv_pool/_pv_pool.py:181

  • Removing this cleanup call might leave pending tasks or open channels. Consider verifying that _pool_ref_store's resources are managed appropriately elsewhere, and document the reasoning for this removal.
await self._pool_ref_store.stop()

src/frequenz/sdk/timeseries/ev_charger_pool/_ev_charger_pool.py:219

  • Ensure that omitting the stop operation on _pool_ref_store doesn't lead to resource leaks. If resource cleanup is handled in another part of the code, an inline comment explaining this would improve clarity.
await self._pool_ref_store.stop()

src/frequenz/sdk/timeseries/battery_pool/_battery_pool.py:401

  • Double-check that the removal of the stop call does not leave orphaned tasks or channels. Consider adding tests or documentation to confirm that resource management remains intact.
await self._pool_ref_store.stop()

Copy link
Contributor

@ela-kotulska-frequenz ela-kotulska-frequenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow but the formulas are shared, too.
So we have the same problem if 2 actors uses the same formula (like BatteryPool.power()) and one of them stops it.
And the same for microgrid.producer().power - one actor stops it, it won't work for other actors.

So this PR won't fix the problem :/
Maybe quick fix would be to not share pools & formulas? (I don't know how much work it would be).
Or have counter (how many is active and how many stopped

Comment on lines 218 to -220
async def stop(self) -> None:
"""Stop all tasks and channels owned by the EVChargerPool."""
await self._pool_ref_store.stop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add comment why this methods are empty?
It looks like bug now, because method description and definition does different thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@llucax
Copy link
Contributor

llucax commented Apr 16, 2025

👋 reference counting 👋

shsms added 2 commits April 16, 2025 10:14
When there are multiple pool instances controlling the same components
and one of them stops, that would stop all the pools.

This commit adds an emergency fix to prevent this bug, to buy some
time to come up with a better solution.

Signed-off-by: Sahas Subramanian <[email protected]>
Signed-off-by: Sahas Subramanian <[email protected]>
@shsms shsms force-pushed the don't-stop-pools branch from 08ac282 to 6c31fbc Compare April 16, 2025 08:14
Copy link
Contributor

@llucax llucax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving because I guess if you are making a PR with such a hack, you really need it urgently, but I think the release notes are a bit misleading. You fixed a bug by introducing a new bug (pools are not stopped) 😆

@github-project-automation github-project-automation bot moved this from To do to Review approved in Python SDK Roadmap Apr 16, 2025
@shsms
Copy link
Contributor Author

shsms commented Apr 16, 2025

And the same for microgrid.producer().power - one actor stops it, it won't work for other actors.

This is not a problem. Users only want to close the resources that they have created. So when they make a new_receiver() and have a reference to the receiver, they will stop it. The same is true with new_battery_pool because they are creating it.

We don't have the idiom of holding references to built-in formulas. When people create custom formulas, they can stop those; that's fine.

Also, when formulas are closed by mistake, it's immediately noticeable. But when the battery pool is closed by mistake and we're only subscribing to SOC or capacity, for example, it's much more dangerous because we'll just assume the value hasn't changed.

Maybe quick fix would be to not share pools & formulas?

Because there are so many metrics, that would quickly become a noticeable overhead. Additionally, the pools share access to the power manager, etc.

I think we should just look for a proper solution, hopefully next week.

@shsms shsms added this pull request to the merge queue Apr 16, 2025
Merged via the queue into frequenz-floss:v1.x.x with commit df61636 Apr 16, 2025
5 checks passed
@shsms shsms deleted the don't-stop-pools branch April 16, 2025 08:39
@github-project-automation github-project-automation bot moved this from Review approved to Done in Python SDK Roadmap Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

part:data-pipeline Affects the data pipeline part:docs Affects the documentation

Projects

Development

Successfully merging this pull request may close these issues.

3 participants